Results 1 - 12 of 12
1.
Front Pharmacol ; 12: 601511, 2021.
Article in English | MEDLINE | ID: mdl-33633572

ABSTRACT

Gene-set analysis is commonly used to identify trends in gene expression when cells, tissues, organs, or organisms are subjected to conditions that differ from those within the normal physiological range. However, tools for gene-set analysis to assess liver and kidney injury responses are less common. Furthermore, most websites for gene-set analysis lack the option for users to customize their gene-set database. Here, we present the ToxPanel website, which allows users to perform gene-set analysis to assess liver and kidney injuries using activation scores based on gene-expression fold-change values. The results are graphically presented to assess constituent injury phenotypes (histopathology), with interactive result tables that identify the main contributing genes to a given signal. In addition, ToxPanel offers the flexibility to analyze any set of custom genes based on gene fold-change values. ToxPanel is publicly available online at https://toxpanel.bhsai.org. ToxPanel allows users to access our previously developed liver and kidney injury gene sets, which we have shown in previous work to yield robust results that correlate with the degree of injury. Users can also test and validate their customized gene sets using the ToxPanel website.
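The abstract does not define ToxPanel's activation score. As a rough illustration only, the Python sketch below assumes a gene-set score equal to the mean log2 fold change of the measured genes in the set, and ranks those genes by absolute log2 fold change to mimic a "main contributing genes" table. The function name `gene_set_score` and the scoring rule are hypothetical, not ToxPanel's method.

```python
import math
from typing import Dict, List, Tuple

def gene_set_score(fold_changes: Dict[str, float],
                   gene_set: List[str]) -> Tuple[float, List[Tuple[str, float]]]:
    """Hypothetical score: mean log2 fold change over the measured genes in a set.

    Also returns the measured genes ranked by |log2 FC|, i.e. the genes that
    contribute most to the signal. Genes missing from the data are skipped.
    """
    logs = [(g, math.log2(fold_changes[g])) for g in gene_set if g in fold_changes]
    if not logs:
        return 0.0, []
    score = sum(v for _, v in logs) / len(logs)
    ranked = sorted(logs, key=lambda gv: abs(gv[1]), reverse=True)
    return score, ranked

# Toy data: gene D is in the set but was not measured.
score, top = gene_set_score({"A": 4.0, "B": 0.5, "C": 2.0}, ["A", "C", "D"])
print(score, top)  # 1.5 [('A', 2.0), ('C', 1.0)]
```

Averaging log-transformed fold changes keeps up- and down-regulation symmetric (a 2-fold increase and a 2-fold decrease cancel), which is one reason log2 values are commonly used for this kind of aggregate.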

2.
PLoS One ; 12(12): e0188461, 2017.
Article in English | MEDLINE | ID: mdl-29216202

ABSTRACT

Certain occupational and geographical exposures have been associated with an increased risk of lung disease. As a baseline for future studies, we sought to characterize the upper respiratory microbiomes of healthy military personnel in a garrison environment. Nasal, oropharyngeal, and nasopharyngeal swabs were collected from 50 healthy active duty volunteers eight times over the course of one year (1107 swabs, completion rate = 92.25%) and subjected to pyrosequencing of the V1-V3 region of 16S rDNA. Respiratory bacterial taxa were characterized at the genus level, using QIIME 1.8 and the Ribosomal Database Project classifier. High levels of Staphylococcus, Corynebacterium, and Propionibacterium were observed among both nasal and nasopharyngeal microbiota, comprising more than 75% of all operational taxonomic units (OTUs). In contrast, Streptococcus was the sole dominant bacterial genus (approximately 50% of all OTUs) in the oropharynx. The average bacterial diversity was greater in the oropharynx than in the nasal or nasopharyngeal region at all time points. Diversity analysis indicated a significant overlap between nasal and nasopharyngeal samples, whereas oropharyngeal samples formed a cluster distinct from these two regions. The study produced a large set of pyrosequencing data on the V1-V3 region of bacterial 16S rDNA for the respiratory microbiomes of healthy active duty Service Members. Pre-processing of sequencing reads showed good data quality. The derived microbiome profiles were consistent both internally and with previous reports, suggesting their utility for further analyses and association studies based on sequence and demographic data.


Subject(s)
Microbiota , Military Personnel , Respiratory System/microbiology , Corynebacterium/genetics , Corynebacterium/isolation & purification , DNA, Ribosomal/genetics , Female , Humans , Male , Nasal Cavity/microbiology , Nasopharynx/microbiology , Propionibacterium/genetics , Propionibacterium/isolation & purification , RNA, Ribosomal, 16S/genetics , Staphylococcus/genetics , Staphylococcus/isolation & purification
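The abstract compares genus-level relative abundances and "bacterial diversity" across sites without naming the diversity index. Assuming the Shannon index H' = -sum(p_i ln p_i), one common choice, a minimal Python sketch of both calculations on made-up OTU counts looks like this:

```python
import math
from typing import Dict

def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all reads assigned to each taxon."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)

# Hypothetical counts: an even community versus one dominated by a single genus.
even = {"g1": 25, "g2": 25, "g3": 25, "g4": 25}
dominated = {"g1": 97, "g2": 1, "g3": 1, "g4": 1}
print(shannon_index(even), shannon_index(dominated))
```

With the same number of taxa, the more even community has the higher index (ln 4 for four equally abundant taxa), which is the sense in which one site can be "more diverse" than another.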
3.
Front Pharmacol ; 8: 889, 2017.
Article in English | MEDLINE | ID: mdl-29255418

ABSTRACT

In drug development, early assessments of pharmacokinetic and toxic properties are important stepping stones to avoid costly and unnecessary failures. Considerable progress has recently been made in the development of computer-based (in silico) models to estimate such properties. Nonetheless, such models can be further improved in terms of their ability to make predictions more rapidly, easily, and with greater reliability. To address this issue, we have used our vNN method to develop 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction models. These models quickly assess some of the most important properties of potential drug candidates, including their cytotoxicity, mutagenicity, cardiotoxicity, drug-drug interactions, microsomal stability, and likelihood of causing drug-induced liver injury. Here we summarize the ability of each of these models to predict such properties and discuss their overall performance. All of these ADMET models are publicly available on our website (https://vnnadmet.bhsai.org/), which also offers the capability of using the vNN method to customize and build new models.
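The abstract does not describe how the vNN method works. The sketch below assumes it is a variable nearest-neighbor scheme: only training compounds within a distance cutoff `d0` of the query contribute, each weighted by a Gaussian of its distance with smoothing parameter `h`, and no prediction is made when no neighbor is close enough. Fingerprints are modeled as sets of "on" bit indices with Tanimoto distance; the parameter values and function names are placeholders, not the published model settings.

```python
import math
from typing import List, Optional, Set, Tuple

def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |A & B| / |A | B| for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union

def vnn_predict(query: Set[int], training: List[Tuple[Set[int], float]],
                d0: float = 0.4, h: float = 0.3) -> Optional[float]:
    """Distance-weighted average of labels from training compounds within d0.

    Returns None when no training compound lies within d0, i.e. the query is
    treated as outside the model's applicability domain.
    """
    num = den = 0.0
    for fp, y in training:
        d = tanimoto_distance(query, fp)
        if d <= d0:
            w = math.exp(-(d / h) ** 2)
            num += w * y
            den += w
    return num / den if den > 0 else None

# Toy training set: (fingerprint bits, label such as 1 = toxic, 0 = non-toxic).
train = [({1, 2, 3, 4}, 1.0), ({1, 2, 3, 4, 5}, 0.0), ({10, 11, 12}, 1.0)]
print(vnn_predict({1, 2, 3, 4}, train))  # weighted toward the identical compound
print(vnn_predict({20, 21}, train))      # None: no neighbor within d0
```

Refusing to predict for distant queries trades coverage for reliability, which is one way a model can make predictions "with greater reliability" at the cost of not answering every query.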

4.
PLoS Negl Trop Dis ; 11(2): e0005395, 2017 02.
Article in English | MEDLINE | ID: mdl-28222130

ABSTRACT

BACKGROUND: A majority of infections caused by dengue virus (DENV) are asymptomatic, but a higher incidence of severe illness, such as dengue hemorrhagic fever, is associated with secondary infections, suggesting that pre-existing immunity plays a central role in dengue pathogenesis. Primary infections are typically associated with a largely serotype-specific antibody response, while secondary infections show a shift to a broadly cross-reactive antibody response. METHODS/PRINCIPAL FINDINGS: We hypothesized that the basis for the shift in serotype-specificity between primary and secondary infections can be found in a change in the antibody fine-specificity. To investigate the link between epitope- and serotype-specificity, we assembled the Dengue Virus Antibody Database, an online repository containing over 400 DENV-specific mAbs, each annotated with information on 1) its origin, including the immunogen, host immune history, and selection methods, 2) binding/neutralization data against all four DENV serotypes, and 3) epitope mapping at the domain or residue level to the DENV E protein. We combined epitope mapping and activity information to determine a residue-level index of epitope propensity and cross-reactivity and generated detailed composite epitope maps of primary and secondary antibody responses. We found differing patterns of epitope-specificity between primary and secondary infections, where secondary responses target a distinct subset of epitopes found in the primary response. We found that secondary infections were marked by an enhanced response to cross-reactive epitopes, such as the fusion-loop and E-dimer region, as well as increased cross-reactivity in what are typically more serotype-specific epitope regions, such as the domain I-II interface and domain III.
CONCLUSIONS/SIGNIFICANCE: Our results support the theory that pre-existing cross-reactive memory B cells form the basis for the secondary antibody response, resulting in a broadening of the response in terms of cross-reactivity, and a focusing of the response to a subset of epitopes, including some, such as the fusion-loop region, that are implicated in poor neutralization and antibody-dependent enhancement of infection.


Subject(s)
Antibodies, Viral/immunology , Cross Reactions , Dengue Virus/immunology , Epitope Mapping , Antibodies, Monoclonal/immunology , Antibodies, Neutralizing/immunology , B-Lymphocytes/immunology , Databases, Factual , Dengue Virus/classification , Immunologic Memory , Protein Binding , Serogroup
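The abstract mentions a residue-level index of epitope propensity and cross-reactivity but does not give its formula. Purely as a hypothetical illustration, the sketch below defines propensity as the fraction of all antibodies whose mapped epitope includes a residue, and cross-reactivity as the fraction of those antibodies flagged as cross-reactive (for example, binding all four serotypes). Both definitions and the toy residue numbers are assumptions, not the database's method.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def residue_indices(mabs: List[Tuple[Set[int], bool]]) -> Dict[int, Tuple[float, float]]:
    """Hypothetical per-residue (propensity, cross_reactivity) indices.

    mabs: one entry per antibody as (set of mapped epitope residues, cross-reactive flag).
    propensity: fraction of all antibodies whose epitope includes the residue.
    cross_reactivity: fraction of those antibodies that are cross-reactive.
    """
    hits: Dict[int, int] = defaultdict(int)
    cross: Dict[int, int] = defaultdict(int)
    for residues, is_cross in mabs:
        for r in residues:
            hits[r] += 1
            if is_cross:
                cross[r] += 1
    n = len(mabs)
    return {r: (hits[r] / n, cross[r] / hits[r]) for r in sorted(hits)}

# Toy data: four antibodies with made-up epitope residues.
toy = [({101, 106}, True), ({106, 107}, False), ({300}, False), ({106}, True)]
print(residue_indices(toy)[106])  # (0.75, 0.6666666666666666)
```

Splitting such indices by the host's immune history (primary versus secondary infection) would give the kind of composite comparison the abstract describes.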
5.
Microbiome ; 2: 31, 2014.
Article in English | MEDLINE | ID: mdl-25228989

ABSTRACT

BACKGROUND: Sample storage conditions, extraction methods, PCR primers, and parameters are major factors that affect metagenomics analysis based on microbial 16S rRNA gene sequencing. Most published studies compared only one or two of these factors. Systematic multi-factor explorations are needed to evaluate the conditions that may impact the validity of a microbiome analysis. This study aimed to improve methodological options to facilitate the best technical approaches in the design of a microbiome study. Three readily available mock bacterial community materials and two commercial extraction techniques, the Qiagen DNeasy and MO BIO PowerSoil DNA purification methods, were used to assess procedures for 16S ribosomal DNA amplification and pyrosequencing-based analysis. Primers were chosen for 16S rDNA quantitative PCR and amplification of region V3 to V1. Swabs spiked with mock bacterial community cells and clinical oropharyngeal swabs were incubated at -80°C, -20°C, 4°C, or 37°C for 4 weeks, then extracted with the two methods, and subjected to pyrosequencing and taxonomic and statistical analyses to investigate microbiome profile stability. RESULTS: The bacterial compositions for the mock community DNA samples determined in this study were consistent with the projected levels and agreed with the literature. The quantitation accuracy of abundances for several genera was improved with changes made to the standard Human Microbiome Project (HMP) procedure. The data for the samples purified with DNeasy and PowerSoil methods were statistically distinct; however, both results were reproducible and in good agreement with each other. The temperature effect on storage stability was investigated by using mock community cells and showed that the microbial community profiles were altered as the incubation temperature increased. However, this phenomenon was not detected when clinical oropharyngeal swabs were used in the experiment.
CONCLUSIONS: Mock community materials originating from the HMP study are valuable controls in developing 16S metagenomics analysis procedures. Long-term exposure to a high temperature may introduce variation into analysis of oropharyngeal swabs, suggesting that they should be stored at 4°C or lower. The observed variations due to sample storage temperature are in a similar range to the intrapersonal variability among different clinical oropharyngeal swab samples.
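The abstract compares measured mock-community compositions with expected levels and tracks how profiles change with storage temperature, but does not name the dissimilarity measure. Assuming Bray-Curtis dissimilarity, a common choice for abundance profiles, a minimal sketch with made-up relative abundances is:

```python
from typing import Dict

def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min(a_t, b_t)) / (sum(a) + sum(b)).

    0 means identical profiles; 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total > 0 else 0.0

# Hypothetical expected mock composition versus a profile observed after storage.
expected = {"GenusA": 0.5, "GenusB": 0.5}
observed = {"GenusA": 0.7, "GenusB": 0.3}
print(bray_curtis(expected, observed))  # approximately 0.2
```

Computing the same value for every pair of storage temperatures, and for pairs of samples from different people, would give comparable scales for the storage effect and for intrapersonal variability.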

6.
BMC Bioinformatics ; 13: 143, 2012 Jun 22.
Article in English | MEDLINE | ID: mdl-22726705

ABSTRACT

BACKGROUND: The concept of orthology is key to decoding evolutionary relationships among genes across different species using comparative genomics. QuartetS is a recently reported algorithm for large-scale orthology detection. Based on the well-established evolutionary principle that gene duplication events discriminate paralogous from orthologous genes, QuartetS has been shown to improve orthology detection accuracy while maintaining computational efficiency. DESCRIPTION: QuartetS-DB is a new orthology database constructed using the QuartetS algorithm. The database provides orthology predictions among 1621 complete genomes (1365 bacterial, 92 archaeal, and 164 eukaryotic), covering more than seven million proteins and four million pairwise orthologs. It is a major source of orthologous groups, containing more than 300,000 groups of orthologous proteins and 236,000 corresponding gene trees. The database also provides over 500,000 groups of inparalogs. In addition to its size, a distinguishing feature of QuartetS-DB is the option for users to select a cutoff value that modulates the balance between prediction accuracy and coverage of the retrieved pairwise orthologs. The database is accessible at https://applications.bioanalysis.org/quartetsdb. CONCLUSIONS: QuartetS-DB is one of the largest orthology resources available to date. Because its orthology predictions are underpinned by evolutionary evidence obtained from sequenced genomes, we expect its accuracy to continue to increase in future releases as the genomes of additional species are sequenced.


Subject(s)
Algorithms , Computational Biology/methods , Databases, Genetic , Gene Duplication , Archaea/genetics , Bacteria/genetics , Biological Evolution , Eukaryota/genetics , Genomics/methods , Proteins/genetics
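The user-selectable cutoff described above trades accuracy against coverage. The abstract does not say what quantity the cutoff applies to, so the sketch below assumes each retrieved ortholog pair carries a hypothetical support value in [0, 1] and measures, for several cutoffs, how many pairs are kept and what fraction of them appear in a reference set of trusted pairs. The data and names are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

def accuracy_vs_coverage(pairs: List[Tuple[str, str, float]], reference: Set[Pair],
                         cutoffs: List[float]) -> Dict[float, Tuple[int, float]]:
    """For each cutoff: (number of pairs with support >= cutoff, fraction found in reference).

    pairs: (protein_a, protein_b, hypothetical support value).
    Pair order is ignored when checking against the reference set.
    """
    result: Dict[float, Tuple[int, float]] = {}
    for c in cutoffs:
        kept = [(a, b) for a, b, s in pairs if s >= c]
        correct = sum(1 for a, b in kept if (a, b) in reference or (b, a) in reference)
        result[c] = (len(kept), correct / len(kept) if kept else float("nan"))
    return result

# Toy data: raising the cutoff drops the low-support (incorrect) pair.
toy_pairs = [("a1", "b1", 0.9), ("a2", "b2", 0.6), ("a3", "b9", 0.3)]
trusted = {("a1", "b1"), ("b2", "a2")}
print(accuracy_vs_coverage(toy_pairs, trusted, [0.0, 0.5]))
```

Plotting the two numbers against the cutoff makes the trade-off explicit: a stricter cutoff retrieves fewer pairs, but a larger share of them are expected to be correct.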
7.
Nucleic Acids Res ; 39(13): e88, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21572104

ABSTRACT

The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. Based on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes, QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and >2 million genes. We found that QuartetS slightly, but consistently, outperformed the highly specific OMA method and that, while consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS and QuartetS-C should be preferred, respectively, in applications where high accuracy and high throughput are required.


Subject(s)
Algorithms , Genes , Phylogeny , Gene Duplication , Genome, Bacterial , Genomics/methods , Sequence Alignment
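QuartetS is described as inferring duplication events from approximate quartet gene trees. The abstract does not give the exact procedure, so the sketch below shows only a standard way to resolve a quartet from pairwise distances, the four-point method: of the three possible pairings of four genes, choose the one whose within-pair distances sum to the minimum. The toy distances are made up. If two genes from genome A pair with each other, as in ((a1,a2),(b1,b2)), the tree is consistent with a duplication in A's lineage after the split from B (in-paralogs); the pairing ((a1,b1),(a2,b2)) would instead point to a duplication before speciation, with a1/b1 and a2/b2 as ortholog pairs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Split = Tuple[Tuple[str, str], Tuple[str, str]]

def quartet_topology(dist: Dict[FrozenSet[str], float], w: str, x: str, y: str, z: str) -> Split:
    """Four-point method: return the pairing whose within-pair distances sum to the minimum."""
    def d(p: str, q: str) -> float:
        return dist[frozenset((p, q))]
    splits = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(splits, key=lambda s: d(*s[0]) + d(*s[1]))

# Toy distances: genes from the same genome (same first letter) are close.
genes = ["a1", "a2", "b1", "b2"]
toy = {frozenset(p): (0.2 if p[0][0] == p[1][0] else 1.0) for p in combinations(genes, 2)}
print(quartet_topology(toy, "a1", "b1", "a2", "b2"))  # (('a1', 'a2'), ('b1', 'b2'))
```

Resolving only four genes at a time avoids building a full gene tree, which is one reason quartet-based reasoning can stay cheap at the scale of hundreds of genomes.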
8.
PLoS One ; 6(3): e17469, 2011 Mar 07.
Article in English | MEDLINE | ID: mdl-21408217

ABSTRACT

BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.


Subject(s)
Genome, Bacterial/genetics , Molecular Sequence Annotation/methods , Software , Base Sequence , Genes, Bacterial/genetics , Reproducibility of Results
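The abstract reports a >94% overlap in identified genes between AGeS and other annotation methods but does not state how two gene calls were matched. A common convention for prokaryotic gene predictions is to count two calls as the same gene when they share the stop codon position and strand, since start sites often differ. The sketch below assumes that convention and 1-based inclusive coordinates; the gene tuples are toy data.

```python
from typing import Iterable, Set, Tuple

Gene = Tuple[str, int, int, str]  # (contig, start, end, strand), 1-based inclusive

def stop_key(g: Gene) -> Tuple[str, int, str]:
    """Stop position: `end` for genes on the + strand, `start` for genes on the - strand."""
    contig, start, end, strand = g
    return (contig, end if strand == "+" else start, strand)

def gene_overlap(pred: Iterable[Gene], ref: Iterable[Gene]) -> float:
    """Fraction of reference genes whose stop position and strand match a predicted gene."""
    pred_stops: Set[Tuple[str, int, str]] = {stop_key(g) for g in pred}
    ref_list = list(ref)
    if not ref_list:
        return 0.0
    matched = sum(1 for g in ref_list if stop_key(g) in pred_stops)
    return matched / len(ref_list)

reference = [("c1", 100, 400, "+"), ("c1", 600, 900, "-"), ("c1", 1000, 1300, "+")]
predicted = [("c1", 130, 400, "+"), ("c1", 600, 950, "-")]
print(gene_overlap(predicted, reference))  # 0.6666666666666666
```

The first predicted gene has a different start but the same stop, so it still counts as a match under this convention.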
9.
PLoS One ; 4(7): e6254, 2009 Jul 16.
Article in English | MEDLINE | ID: mdl-19606223

ABSTRACT

BACKGROUND: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. METHODOLOGY/PRINCIPAL FINDINGS: The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. 
CONCLUSIONS: The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Cluster Analysis , Databases, Protein , Molecular Sequence Data , Programming Languages , Protein Conformation , Sequence Homology, Amino Acid , User-Computer Interface
10.
Proteins ; 74(2): 449-60, 2009 Feb 01.
Article in English | MEDLINE | ID: mdl-18636476

ABSTRACT

In this article, we present a new method termed CatFam (Catalytic Families) to automatically infer the functions of catalytic proteins, which account for 20-40% of all proteins in living organisms and play a critical role in a variety of biological processes. CatFam is a sequence-based method that generates sequence profiles to represent and infer protein catalytic functions. CatFam generates profiles through a stepwise procedure that carefully controls profile quality and employs nonenzymes as negative samples to establish profile-specific thresholds associated with a predefined nominal false-positive rate (FPR) of predictions. The adjustable FPR allows for fine precision control of each profile and enables the generation of profile databases that meet different needs: function annotation with high precision and hypothesis generation with moderate precision but better recall. Multiple tests of CatFam databases (generated with distinct nominal FPRs) against enzyme and nonenzyme datasets show that the method's predictions have consistently high precision and recall. For example, a 1% FPR database predicts protein catalytic functions for a dataset of enzymes and nonenzymes with 98.6% precision and 95.0% recall. Comparisons of CatFam databases against other established profile-based methods for the functional annotation of 13 bacterial genomes indicate that CatFam consistently achieves higher precision and (in most cases) higher recall, and that (on average) CatFam provides 21.9% additional catalytic functions not inferred by the other similarly reliable methods. These results strongly suggest that the proposed method provides a valuable contribution to the automated prediction of protein catalytic functions. The CatFam databases and the database search program are freely available at http://www.bhsai.org/downloads/catfam.tar.gz.


Subject(s)
Algorithms , Databases, Protein , Sequence Analysis, Protein/methods , Animals , Catalysis , Cluster Analysis , Enzymes/genetics , Enzymes/metabolism , Genome , Humans , Metabolic Networks and Pathways , Protein Structure, Tertiary , Proteins/genetics , Proteins/metabolism , Reproducibility of Results , Structure-Activity Relationship
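CatFam is described as using nonenzymes as negative samples to set profile-specific thresholds tied to a nominal false-positive rate. The abstract does not give the exact rule, so the sketch below shows a generic empirical approach under that assumption: pick the threshold so that at most the nominal fraction of negative scores lies strictly above it, then evaluate precision and recall on labeled scores. Scores and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def fpr_threshold(negative_scores: Sequence[float], nominal_fpr: float) -> float:
    """Threshold t such that at most nominal_fpr of the negatives score strictly above t.

    A prediction is made when score > t.
    """
    neg = sorted(negative_scores, reverse=True)
    allowed = math.floor(nominal_fpr * len(neg))  # negatives permitted above t
    if allowed >= len(neg):
        return float("-inf")
    return neg[allowed]

def precision_recall(pos_scores: Sequence[float], neg_scores: Sequence[float],
                     t: float) -> Tuple[float, float]:
    """Precision and recall of the rule 'score > t' on labeled scores."""
    tp = sum(s > t for s in pos_scores)
    fp = sum(s > t for s in neg_scores)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(pos_scores) if pos_scores else 0.0
    return precision, recall

negatives = [float(i) for i in range(10)]  # toy nonenzyme scores 0..9
positives = [7.0, 8.5, 10.0, 12.0]         # toy enzyme scores
t = fpr_threshold(negatives, 0.10)
print(t, precision_recall(positives, negatives, t))  # 8.0 (0.75, 0.75)
```

Lowering the nominal FPR raises the threshold, which tends to increase precision and reduce recall; this mirrors the abstract's distinction between high-precision annotation databases and moderate-precision, higher-recall databases for hypothesis generation.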
11.
BMC Bioinformatics ; 9: 52, 2008 Jan 25.
Article in English | MEDLINE | ID: mdl-18221520

ABSTRACT

BACKGROUND: Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities. RESULTS: PIPA annotates protein functions by combining the results of multiple programs and databases, such as InterPro and the Conserved Domains Database, into common Gene Ontology (GO) terms. The major algorithms implemented in PIPA are: (1) a profile database generation algorithm, which generates customized profile databases to predict particular protein functions, (2) an automated ontology mapping generation algorithm, which maps various classification schemes into GO, and (3) a consensus algorithm to reconcile annotations from the integrated programs and databases. PIPA's profile generation algorithm is employed to construct the enzyme profile database CatFam, which predicts catalytic functions described by Enzyme Commission (EC) numbers. Validation tests show that CatFam yields average recall and precision larger than 95.0%. CatFam is integrated with PIPA. We use an association rule mining algorithm to automatically generate mappings between terms of two ontologies from annotated sample proteins. Incorporating the ontologies' hierarchical topology into the algorithm increases the number of generated mappings. In particular, it generates 40.0% additional mappings from the Clusters of Orthologous Groups (COG) to EC numbers and a six-fold increase in mappings from COG to GO terms.
The mappings to EC numbers show a very high precision (99.8%) and recall (96.6%), while the mappings to GO terms show moderate precision (80.0%) and low recall (33.0%). Our consensus algorithm for GO annotation is based on the computation and propagation of likelihood scores associated with GO terms. The test results suggest that, for a given recall, the application of the consensus algorithm yields higher precision than when consensus is not used. CONCLUSION: The algorithms implemented in PIPA provide automated genome-wide protein function annotation based on reconciled predictions from multiple resources.


Subject(s)
Algorithms , Computational Biology/methods , Databases, Protein , Pattern Recognition, Automated/methods , Proteins/genetics , Proteins/physiology , Proteomics/methods , Amino Acid Sequence , Structure-Activity Relationship
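PIPA is described as using association rule mining to derive mappings between two annotation schemes from proteins annotated in both. The sketch below shows the basic support/confidence form of that idea, assuming a rule "source term -> target term" is kept when enough proteins carry both terms (support) and most proteins with the source term also carry the target term (confidence). The cutoff values and term IDs are hypothetical placeholders, and the hierarchy-aware extension mentioned in the abstract is not shown.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def mine_mappings(proteins: List[Tuple[Set[str], Set[str]]], min_support: int = 2,
                  min_confidence: float = 0.8) -> Dict[Tuple[str, str], float]:
    """Mine 'source term -> target term' rules from doubly annotated proteins.

    proteins: per protein, (source-scheme terms, target-scheme terms).
    support: number of proteins carrying both terms.
    confidence: support / number of proteins carrying the source term.
    Returns the kept rules with their confidence.
    """
    src_count: Counter = Counter()
    pair_count: Counter = Counter()
    for src_terms, tgt_terms in proteins:
        for s in src_terms:
            src_count[s] += 1
            for t in tgt_terms:
                pair_count[(s, t)] += 1
    return {
        (s, t): n / src_count[s]
        for (s, t), n in pair_count.items()
        if n >= min_support and n / src_count[s] >= min_confidence
    }

# Toy annotations with placeholder IDs.
toy = [({"COG_X"}, {"EC_1"}), ({"COG_X"}, {"EC_1"}),
       ({"COG_X"}, {"EC_2"}), ({"COG_Y"}, {"EC_3"})]
print(mine_mappings(toy, min_confidence=0.6))  # {('COG_X', 'EC_1'): 0.6666666666666666}
```

Raising `min_confidence` favors precise mappings at the expense of recall, consistent with the abstract's report of high-precision EC mappings but lower-recall GO mappings.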
12.
Nucleic Acids Res ; 34(Web Server issue): W626-31, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845086

ABSTRACT

The Onto-Tools suite is composed of an annotation database and eight complementary, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner, Pathway-Express, Promoter-Express and nsSNPCounter. Promoter-Express is a new tool added to the Onto-Tools ensemble that facilitates the identification of transcription factor binding sites active in specific conditions. nsSNPCounter is another new tool that allows computation and analysis of synonymous and non-synonymous codon substitutions for studying evolutionary rates of protein coding genes. Onto-Translate has also been enhanced to expand its scope and accuracy by fully utilizing the capabilities of the Onto-Tools database. Currently, Onto-Translate allows arbitrary mappings between 28 types of IDs for 53 organisms. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.


Subject(s)
Databases, Genetic , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Proteins/genetics , Software , Transcription Factors/metabolism , Binding Sites , Evolution, Molecular , Internet , Systems Integration , User-Computer Interface
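nsSNPCounter is described as computing synonymous and non-synonymous codon substitutions; its exact method is not given here. The sketch below is a simple codon-level count under stated assumptions: two aligned, in-frame coding sequences containing only A, C, G, and T, and the standard genetic code. A differing codon is counted as synonymous when both codons encode the same amino acid (or are both stops). It does not normalize by synonymous and non-synonymous sites, as rate estimators such as Nei-Gojobori do.

```python
from typing import Dict, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)
}

def count_substitutions(seq1: str, seq2: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s1, s2 = seq1.upper(), seq2.upper()
    syn = nonsyn = 0
    for i in range(0, len(s1), 3):
        c1, c2 = s1[i:i + 3], s2[i:i + 3]
        if c1 == c2:
            continue
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

# CTT -> CTC keeps leucine (synonymous); AAA -> AGA changes Lys to Arg.
print(count_substitutions("ATGCTTAAA", "ATGCTCAGA"))  # (1, 1)
```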